คู่มือการเขียนโปรแกรม CUDA: การจับคู่ฮาร์ดแวร์กับซอฟต์แวร์: เวอร์ชันความสามารถในการคำนวณ

ความสามารถในการคำนวณ (CC) ทำหน้าที่เป็นสะพานเชื่อมเวอร์ชันระหว่าง สถาปัตยกรรมเสมือน (PTX) กับ สถาปัตยกรรมจริง (SASS/ไบนารี) นักพัฒนาใช้ nvcc เพื่อเป้าหมายแพลตฟอร์มเฉพาะ ตั้งแต่แพลตฟอร์มเดสก์ท็อป/เซิร์ฟเวอร์ ไปจนถึงแพลตฟอร์มฝังตัว ภายใต้โมเดลระบบปฏิบัติการ เช่น Linux 64 บิต (LP64) หรือ Windows 64 บิต (LLP64).

1. สถาปัตยกรรมเสมือนกับสถาปัตยกรรมจริง

เครื่องมือพัฒนา CUDA รองรับสถาปัตยกรรม GPU จากสองเวอร์ชันหลักล่าสุด ซึ่งอ้างอิงใน ตารางที่ 29: การสนับสนุนคุณสมบัติข้ามความสามารถในการคำนวณ (7.5 ถึง 12.x). เราจะกำหนดการจับคู่โดยใช้แฟล็ก เช่น: nvcc --generate-code arch=compute_80,code=sm_90 prog.cu. สำหรับเป้าหมายที่เน้นอนาคต ใช้แฟล็ก เช่น nvcc -arch=sm_100 หรือแบบเฉพาะเจาะจง เช่น nvcc -arch=sm_100a ถูกใช้งาน

2. ลำดับชั้นของเมโคร

คอมไพเลอร์ใช้ __CUDA_ARCH__ เพื่อแยกทางโค้ด เมโคร __CUDA_ARCH__ จะถูกกำหนดเฉพาะในโค้ดอุปกรณ์ (เช่น __device__, __global__) ควบคุมได้ละเอียดมากขึ้น โดยใช้ __CUDA_ARCH_SPECIFIC__ และ __CUDA_ARCH_FAMILY_SPECIFIC__. คุณสมบัติบางอย่าง เช่น หน่วยความจำแบบแชร์แบบกระจาย หรือเฉพาะ ข้อมูลที่ไม่ใช่จำนวนจริง (NaN)ต้องการ ความสามารถในการคำนวณ 9.0+ หรือ ความสามารถในการคำนวณ 10.0 และใหม่กว่านั้น.

3. ข้อจำกัดด้านค่าตัวเลขและความจุ

ความแม่นยำแตกต่างกันตามระดับความสามารถในการคำนวณ; ตัวอย่างเช่น การจัดการค่าเล็กมาก (subnormal) รับประกันว่า $2^{-16382} \approx 3.36 \cdot 10^{-4932}$ ข้อจำกัดด้านฮาร์ดแวร์ เช่น CUDA_DEVICE_MAX_COPY_CONNECTIONS=16 หรือ คำสั่ง .maxnreg PTX จะถูกบังคับใช้อย่างเข้มงวดตามเวอร์ชันความสามารถในการคำนวณเป้าหมาย

TERMINALbash — 80x24

> Ready. Click "Run" to execute.

QUESTION 1

Where is the __CUDA_ARCH__ macro strictly defined?

In both host and device code to identify the GPU hardware.

Only in device code (__device__, __host__ __device__, or __global__).

Only in the host-side main() function.

In the NVML API headers only.

QUESTION 2

Which command correctly targets a virtual arch of 8.0 and a real arch of 9.0?

nvcc -arch=sm_80,code=compute_90

nvcc --generate-code arch=compute_80,code=sm_90

nvcc --target=cc_8.0_9.0

nvcc -arch=sm_90

QUESTION 3

What is the consequence of declaring 'namespace cuda { struct foo; }' in CUDA code?

It enables high-speed memory access.

It is an error; the 'cuda' namespace is reserved.

It is required for using cuda::std::result_of.

It allows the use of __nv_atomic_load.

QUESTION 4

Which mathematical property is associated with CC 9.x and higher?

Support for 16-byte data types.

Basic support for fabs(x) and sin(x).

Introduction of the warpSize macro.

Ability to use x + y in kernels.

QUESTION 5

What happens when an extended lambda is defined inside a generic lambda using Microsoft Visual Studio host compilers?

It compiles successfully and inlines perfectly.

The host compiler may fail to inline or throw an error.

It enables __managed__ memory by default.

It triggers the on-disk JIT compilation cache.